`RecordBatch` normalization (flattening) #6758

ngli-me · 2024-11-20T03:10:29Z

Which issue does this PR close?

Closes #6369.

Rationale for this change

Adds normalization (flattening) for RecordBatch, with normalization via Schema. Based on pandas/pola-rs.

What changes are included in this PR?

Are there any user-facing changes?


ngli-me

I had some questions about the implementation, since the one example from PyArrow doesn't clarify the edge cases here. Normalizing the Schema seems fairly straightforward to me; I'm just not sure about:

- Whether the iterative or recursive approach is better (or whether there is something I missed).
- Whether `DataType::Struct` is the only DataType that requires flattening. To me, it looks like the only one that can contain nested Fields.

(I'm also not sure if I'm missing something about unwrapping a type like `List<Struct>`.)

Any feedback/help would be appreciated!
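To make the recursive-versus-iterative question concrete, here is a minimal sketch of both approaches. It uses toy `DataType` and `Field` types defined inline; they are hypothetical stand-ins, not the arrow-schema types, and the function names `flatten_recursive` and `flatten_iterative` are illustrative rather than anything in this PR. The sketch assumes that only struct fields are flattened, that child names are joined to the parent with a separator, and that a list of structs is left untouched. That last point is an assumption for illustration, not a settled answer to the `List<Struct>` question.

```rust
// Toy stand-ins for illustration only; these are NOT the arrow-schema types.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int64,
    Utf8,
    List(Box<Field>),
    Struct(Vec<Field>),
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
}

fn field(name: &str, data_type: DataType) -> Field {
    Field { name: name.to_string(), data_type }
}

/// Recursive variant: descend into each struct, prefixing child names
/// with the parent name and the separator.
fn flatten_recursive(fields: &[Field], sep: &str) -> Vec<Field> {
    let mut out = Vec::new();
    for f in fields {
        match &f.data_type {
            DataType::Struct(children) => {
                let renamed: Vec<Field> = children
                    .iter()
                    .map(|c| field(&format!("{}{}{}", f.name, sep, c.name), c.data_type.clone()))
                    .collect();
                out.extend(flatten_recursive(&renamed, sep));
            }
            // Assumption: every non-struct type, including List, is kept as-is.
            _ => out.push(f.clone()),
        }
    }
    out
}

/// Iterative variant: the same depth-first order, using an explicit stack
/// instead of the call stack.
fn flatten_iterative(fields: &[Field], sep: &str) -> Vec<Field> {
    // Reverse so that the first field is popped first.
    let mut stack: Vec<Field> = fields.iter().rev().cloned().collect();
    let mut out = Vec::new();
    while let Some(Field { name, data_type }) = stack.pop() {
        match data_type {
            DataType::Struct(children) => {
                // Push children in reverse so the first child is popped first.
                for c in children.into_iter().rev() {
                    stack.push(Field {
                        name: format!("{}{}{}", name, sep, c.name),
                        data_type: c.data_type,
                    });
                }
            }
            other => out.push(Field { name, data_type: other }),
        }
    }
    out
}

/// Collects field names, to make results easy to inspect.
fn names(fields: &[Field]) -> Vec<String> {
    fields.iter().map(|f| f.name.clone()).collect()
}

/// A small nested schema: a struct inside a struct, a list of structs, and a leaf.
fn example_schema() -> Vec<Field> {
    vec![
        field("a", DataType::Struct(vec![
            field("b", DataType::Int64),
            field("c", DataType::Struct(vec![field("d", DataType::Utf8)])),
        ])),
        field("tags", DataType::List(Box::new(
            field("item", DataType::Struct(vec![field("k", DataType::Utf8)])),
        ))),
        field("f", DataType::Int64),
    ]
}
```

Calling either function on `example_schema()` with `"."` as the separator yields the same field order, so in this sketch the choice is mostly about stack depth on deeply nested schemas and readability, not about the result.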


nglime added 2 commits November 18, 2024 14:11

Added set up for the example of flattening from pyarrow.

bbd7c8b

Logic for recursive normalizer with a base normalize function, based …

8abcd25

…on pola-rs.

ngli-me changed the title ~~Feature/record batch flatten~~ RecordBatch normalization (flattening) Nov 20, 2024

ngli-me changed the title ~~RecordBatch normalization (flattening)~~ RecordBatch normalization (flattening) Nov 20, 2024

Added recursive normalize function for Schema, and started building…

6bba7d3

… iterative function for `RecordBatch`. Not sure which one is better currently.

github-actions bot added the arrow Changes to the arrow crate label Nov 23, 2024

Built out a bit more of the iterative normalize.

55eb953

ngli-me commented Nov 23, 2024

arrow-array/src/record_batch.rs (outdated, resolved)

arrow-array/src/record_batch.rs (outdated, resolved)

arrow-schema/src/schema.rs (resolved)

ngli-me marked this pull request as ready for review November 23, 2024 19:03

ngli-me marked this pull request as draft November 23, 2024 23:30

nglime added 2 commits November 23, 2024 21:03

Fixed normalize function for RecordBatch. Adjusted test case to mat…

30d6294

…ch the example from PyArrow.

Added tests for Schema normalization. Partial tests for RecordBatch.

0ed979d

ngli-me commented Nov 25, 2024

arrow-array/src/record_batch.rs (outdated, resolved)

nglime added 2 commits November 24, 2024 21:54

Removed stray comments.

d9d08cd

Commenting out exclamation field.

d1b3260

ngli-me marked this pull request as ready for review November 25, 2024 04:02

nglime added 3 commits December 4, 2024 22:04

Merge remote-tracking branch 'upstream/main' into feature/record-batc…

a12082c

…h-flatten

Fixed test for RecordBatch.

7adda58

Formatting.

9c9c699
